Description
Agent Lightning already supports weight-level training (VERL) and beam-search prompt optimization (APO). This PR adds GEPA — an evolutionary prompt optimizer that fills an important gap: fast, inference-only prompt improvement that tracks per-example performance via a Pareto frontier.
Unlike APO's beam search, GEPA evolves prompt candidates through reflective mutations: it examines execution traces, identifies where prompts fall short on specific examples, and proposes targeted improvements. This makes it particularly effective for tasks with diverse failure modes, where a single "best prompt" metric can hide regressions on individual cases.
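The Pareto-frontier bookkeeping described above can be sketched in a few lines. This is an illustrative toy, not the actual GEPA implementation; the function name and score layout are assumptions:

```python
def pareto_frontier(candidates, scores):
    """Keep every candidate that is best on at least one example.

    `scores[c]` is a list of per-example scores for candidate `c`.
    A candidate is pruned only if some other candidate matches or
    beats it on every example and strictly beats it on at least one,
    so a prompt that excels on even a single input survives.
    """
    frontier = []
    for c in candidates:
        dominated = any(
            all(so >= sc for so, sc in zip(scores[o], scores[c]))
            and any(so > sc for so, sc in zip(scores[o], scores[c]))
            for o in candidates
            if o is not c
        )
        if not dominated:
            frontier.append(c)
    return frontier


# Toy per-example scores for three prompt candidates:
scores = {
    "p0": [0.9, 0.2, 0.5],
    "p1": [0.4, 0.8, 0.5],  # best on example 1 -> survives
    "p2": [0.3, 0.1, 0.4],  # dominated by p0 -> pruned
}
print(pareto_frontier(list(scores), scores))  # ['p0', 'p1']
```

This is why a frontier beats a single "best prompt" metric: `p1` would lose to `p0` on average score, yet it is the only candidate that handles example 1 well, so it stays in the pool for further mutation.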
What's included
- A GEPA optimizer that integrates with AGL's async store, converts between AGL resources and GEPA candidates, and supports W&B experiment tracking out of the box.
- An example that optimizes the prompt across 57 scenarios. The example now supports Azure Entra ID, Azure API key, and plain OpenAI as backends (`--provider` flag or `LLM_PROVIDER` env var), so contributors without Azure access can run it with just an OpenAI key.
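The flag-over-env-var precedence can be sketched as below. The choice names and the helper are hypothetical, shown only to illustrate the resolution order (flag wins, then `LLM_PROVIDER`, then plain OpenAI):

```python
import argparse
import os


def resolve_provider(argv=None):
    """Resolve the LLM backend: the --provider flag takes precedence,
    then the LLM_PROVIDER env var, then a default of plain OpenAI."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--provider",
        choices=["openai", "azure", "azure-entra"],  # hypothetical names
        default=None,
    )
    # parse_known_args lets the example keep its other CLI flags intact
    args, _ = parser.parse_known_args(argv)
    return args.provider or os.environ.get("LLM_PROVIDER", "openai")


os.environ.pop("LLM_PROVIDER", None)
print(resolve_provider(["--provider", "azure"]))  # azure
os.environ["LLM_PROVIDER"] = "azure-entra"
print(resolve_provider([]))                       # azure-entra
```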
How GEPA compares to the existing algorithms
GEPA is a good fit when you want to improve prompts without touching model weights, and you care about not regressing on specific inputs while improving overall performance.
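The "improve without regressing" criterion amounts to a Pareto-style acceptance test on per-example scores. A minimal sketch, assuming scores are kept as dicts of example id to metric (the function and `eps` tolerance are illustrative, not part of the PR):

```python
def accept(candidate_scores, incumbent_scores, eps=0.0):
    """Accept a new prompt only if it never regresses on any tracked
    example (within eps) and strictly improves on at least one."""
    no_regression = all(
        candidate_scores[ex] >= incumbent_scores[ex] - eps
        for ex in incumbent_scores
    )
    improves = any(
        candidate_scores[ex] > incumbent_scores[ex]
        for ex in incumbent_scores
    )
    return no_regression and improves


old = {"ex1": 0.7, "ex2": 0.9}
print(accept({"ex1": 0.8, "ex2": 0.9}, old))  # True: ex1 improves, ex2 holds
print(accept({"ex1": 0.8, "ex2": 0.6}, old))  # False: ex2 regresses
```

A weight-averaged metric would call the second candidate an improvement (0.70 vs. 0.80 mean); the per-example check is what catches the regression on `ex2`.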
Example W&B logs